complexity measure
Lifelong Learning with Weighted Majority Votes
Better understanding of the potential benefits of information transfer and representation learning is an important step towards the goal of building intelligent systems that are able to persist in the world and learn over time. In this work, we consider a setting where the learner encounters a stream of tasks but is able to retain only limited information from each encountered task, such as a learned predictor. In contrast to most previous works analyzing this scenario, we do not make any distributional assumptions on the task generating process. Instead, we formulate a complexity measure that captures the diversity of the observed tasks. We provide a lifelong learning algorithm with error guarantees for every observed task (rather than on average). We show sample complexity reductions in comparison to solving every task in isolation in terms of our task complexity measure. Further, our algorithmic framework can naturally be viewed as learning a representation from encountered tasks with a neural network.
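As a rough illustration of the idea named in the title, the sketch below combines predictors retained from earlier tasks through a weighted majority vote and measures the vote's error on a sample from a new task. It is a minimal sketch under stated assumptions, not the algorithm from the paper: the binary labels in {-1, +1}, the function names `weighted_majority_vote` and `vote_error`, the toy predictors, and the weights are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a predictor maps a feature vector to a label in {-1, +1}.
Predictor = Callable[[Sequence[float]], int]


def weighted_majority_vote(
    predictors: Sequence[Predictor],
    weights: Sequence[float],
    x: Sequence[float],
) -> int:
    """Return the sign of the weighted sum of the predictors' votes on x."""
    if len(predictors) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    score = sum(w * h(x) for h, w in zip(predictors, weights))
    # Ties are broken in favour of +1 (an arbitrary choice for this sketch).
    return 1 if score >= 0 else -1


def vote_error(
    predictors: Sequence[Predictor],
    weights: Sequence[float],
    sample: Sequence[Tuple[Sequence[float], int]],
) -> float:
    """Fraction of labelled examples on which the weighted vote is wrong."""
    mistakes = sum(
        1 for x, y in sample if weighted_majority_vote(predictors, weights, x) != y
    )
    return mistakes / len(sample)


# Hypothetical predictors retained from three earlier tasks.
retained: List[Predictor] = [
    lambda x: 1 if x[0] > 0 else -1,
    lambda x: 1 if x[1] > 0 else -1,
    lambda x: 1 if x[0] + x[1] > 1 else -1,
]

# Hypothetical weights and a small labelled sample from a new task.
weights = [0.6, 0.3, 0.1]
new_task_sample = [
    ((1.0, 1.0), 1),
    ((-1.0, -1.0), -1),
    ((1.0, -1.0), 1),
    ((-1.0, 1.0), -1),
]

error = vote_error(retained, weights, new_task_sample)
```

In this toy setting the new task is solved by reweighting the stored predictors alone, which is the kind of reuse the abstract contrasts with solving every task in isolation.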
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.36)
ff1418e8cc993fe8abcfe3ce2003e5c5-Supplemental.pdf
The table (right) shows 100-epoch results using the best lr and wd values found at 50 epochs. ViT's patchify stem differs from the proposed convolutional stem in the type of convolution used and [...] We investigate these factors next. The focus of this paper is studying the large, positive impact of changing ViT's default [...] We use AdamW for all experiments. Figure 7 shows the results. The table (right) shows 100-epoch results using optimal lr and wd values chosen from the 50-epoch runs.
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Indiana > Jackson County > Seymour (0.04)